Tight Lower Bounds for Multi-pass Stream Computation Via Pass Elimination

Authors

  • Sudipto Guha
  • Andrew McGregor

Abstract

There is a natural relationship between lower bounds in the multi-pass stream model and lower bounds in multi-round communication. However, this connection is less understood than the connection between single-pass streams and one-way communication. In this paper, we consider data-stream problems for which reductions from natural multi-round communication problems do not yield tight bounds or do not apply. While lower bounds are known for some of these data-stream problems, many of them apply only to deterministic or comparison-based algorithms, whereas the lower bounds we present apply to any (possibly randomized) algorithm. Our results are particularly relevant to evaluating functions that depend on the ordering of the stream, such as the longest increasing subsequence and a variant of tree pointer jumping in which pointers are revealed according to a post-order traversal. Our approach is based on establishing “pass-elimination” results that are analogous to the round-elimination results of Miltersen et al. [23] and Sen [29]. We demonstrate our approach by proving tight bounds for a range of data-stream problems, including finding the longest increasing subsequence (a problem that has recently received considerable attention [22, 16, 30, 15, 12]; we also resolve an open question of [30]), constructing convex hulls and fixed-dimensional linear programming (generalizing results of [8] to randomized algorithms), and the “greater-than” problem (improving results of [9]). These results also clarify one of the main messages of our work: sometimes it is necessary to prove lower bounds directly for stream computation rather than proving a lower bound for a communication problem and then constructing a reduction to a data-stream problem.
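
For context on the longest increasing subsequence (LIS) problem named above, here is a minimal sketch of the standard exact one-pass approach (patience sorting). This is a textbook technique, not the method of this paper, which proves lower bounds rather than giving algorithms. The sketch keeps, for each length j, the smallest value that can end a strictly increasing subsequence of that length, so its memory grows with the LIS length k rather than with the stream length. The function name `lis_length_one_pass` is hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from typing import Iterable


def lis_length_one_pass(stream: Iterable[int]) -> int:
    """Length of the longest strictly increasing subsequence, in one pass.

    tails[j] holds the smallest value that can end a strictly increasing
    subsequence of length j + 1 among the items seen so far. The list stays
    sorted, so each item is placed by binary search, and its size never
    exceeds the LIS length k: O(k) words of memory, O(log k) time per item.
    """
    tails: list[int] = []
    for x in stream:  # each item is read exactly once
        # First position whose tail is >= x; x replaces that tail or extends.
        j = bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[j] = x  # x is a smaller tail for subsequences of length j + 1
    return len(tails)


print(lis_length_one_pass([3, 1, 4, 1, 5, 9, 2, 6]))  # prints 4
```

Because the state is just the sorted `tails` list, the approach is cheap only when k is small; for a stream whose LIS is long, the stored state is correspondingly large.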

Similar resources

Lower Bounds for Multi-Pass Processing of Multiple Data Streams

This paper gives a brief overview of computation models for data stream processing, and it introduces a new model for multi-pass processing of multiple streams, the so-called mp2s-automata. Two algorithms for solving the set disjointness problem with these automata are presented. The main technical contribution of this paper is the proof of a lower bound on the size of memory and the number of ...

Lower Bounds for Quantile Estimation in Random-Order and Multi-pass Streaming

We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model and it is relatively well understood in the basic single-pass data stream model in which the values are ordered adversarially. Natural extensions of this basic model include the random-order model in which the v...

Towards Tighter Space Bounds for Counting Triangles and Other Substructures in Graph Streams

We revisit the much-studied problem of space-efficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixed-sized cliques and cycles. For the important special case of counting triangles, we give a 4-pass, (1 ± ε)-approximate, randomized algorithm using $\tilde{O}(\varepsilon^{-2} m^{3/2}/T)$ space, where m is the number of edges and T is a promised lower bound on the ...

Pass-Efficient Algorithms for Learning Mixtures of Uniform Distributions

We present multiple-pass streaming algorithms for a basic statistical clustering problem for massive data sets. If our algorithm is allotted 2l passes, it will produce an approximation with error at most ε using $\tilde{O}(k^3/\varepsilon^{2/l})$ bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the probl...

Approximating the Longest Increasing Sequence and Distance from Sortedness in a Data Stream

We revisit the well-studied problem of estimating the sortedness of a data stream. We study the complementary problems of estimating the edit distance from sortedness (Ulam distance) and estimating the length of the longest increasing sequence (LIS). We present the first sub-linear space algorithms for these problems in the data stream model. • We give an O(log n)-space, one-pass randomized algo...

Publication date: 2008